Data

This is the combined dataset from /library/s4_lectures/3_network_science/03_02_mentorship_network_paths.ipynb (in the notebook, its final name is connect_names).

Codebook:

CID unique identifier for each connection

MenteeID unique identifier of the trainee.

MentorID unique identifier of the mentor.

MentorshipType integer code for the type of relationship (open question: what does “4” mean?):

  • 0=undergrad research assistant,

  • 1=graduate student,

  • 2=postdoctoral fellow,

  • 3=research scientist.

Institution name of the institution where training took place, stored as a raw string (the names are not standardized; is there a solution on zenodo.org?)

StopYear year of graduation or training completion (open question: what does -1 mean?)

gender_t, gender_m gender of the trainee (_t) and of the mentor (_m), inferred from the first name (using the dataset available at zenodo.org)

ResearchArea_t, ResearchArea_m first research area of the trainee (_t) and of the mentor (_m); only the first entry of each person’s full list of areas is kept

## Rows: 742,766
## Columns: 10
## $ CID            <dbl> 2, 3, 5, 17, 18, 19, 25, 36, 44, 58, 106, 105, 111, 1, …
## $ MenteeID       <dbl> 2, 4, 6, 27, 28, 8, 5, 17, 7, 60, 33, 105, 108, 1, 521,…
## $ MentorID       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ MentorshipType <chr> "1=graduate student", "2=postdoctoral fellow", "1=gradu…
## $ Institution    <chr> "University of California, Berkeley", "University of Ca…
## $ StopYear       <dbl> 2005, 2006, 2008, -1, -1, 2006, 2009, 2002, 2004, 2007,…
## $ ResearchArea_t <chr> "neuro", "neuro", "neuro", "neuro", "neuro", "neuro", "…
## $ ResearchArea_m <chr> "neuro", "neuro", "neuro", "neuro", "neuro", "neuro", "…
## $ gender_t       <chr> "man", "man", "man", "man", "woman", "woman", "unknown"…
## $ gender_m       <chr> "man", "man", "man", "man", "man", "man", "man", "man",…
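The preview shows two things that need handling before any counting: MentorshipType arrives as a label string such as "1=graduate student", and StopYear contains -1. A minimal Python sketch of both steps follows. The helper names split_mentorship_type and clean_stop_year are hypothetical, and treating -1 as "unknown year" is an assumption, since the codebook leaves that question open.

```python
from typing import Optional, Tuple


def split_mentorship_type(label: str) -> Tuple[int, str]:
    """Split a label such as '1=graduate student' into (1, 'graduate student')."""
    code, _, name = label.partition("=")
    return int(code), name.strip()


def clean_stop_year(year: float) -> Optional[int]:
    """Return the year as an int, or None for the -1 placeholder."""
    # Assumption: -1 stands for an unknown year; the codebook does not confirm this.
    return None if year == -1 else int(year)


# Values taken from the preview above
print(split_mentorship_type("1=graduate student"))
print([clean_stop_year(y) for y in (2005, 2006, 2008, -1, -1)])
```

Keeping the numeric code separate from the label makes it easy to filter on code 1 (graduate students) later without string matching.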

Areas

Figure 1: Gender structure by main areas (panels: Mentors, Trainees)

Years

Figure 2: Number of mentors by year (panels: Total, By main areas)


Mentorship Type

Table 1: Distribution of MentorshipType

as.character(MentorshipType)      Overall (N=742766)
0=undergrad research assistant    18838 (2.5%)
1=graduate student                630099 (84.8%)
2=postdoctoral fellow             68618 (9.2%)
3=research scientist              7402 (1.0%)
4=?                               17809 (2.4%)
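A count-and-percent table like the one above can be reproduced with a few lines of Python. This is only a sketch on toy labels, not the real data, and frequency_table is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return [(value, count, percent)] sorted by value, percent rounded to 1 decimal."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(v, n, round(100 * n / total, 1)) for v, n in sorted(counts.items())]


# Toy labels for illustration only (not the real dataset)
toy = ["1=graduate student"] * 8 + ["2=postdoctoral fellow"] + ["4=?"]
for value, n, pct in frequency_table(toy):
    print(f"{value:<25} {n:>3} ({pct}%)")
```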


Figure 2:


Reflection: what is “family”?

Family 1. Single parent with child

Family 2. Single parent: large family with twins

Family 3. Two parents with a child

Family 4. Mixed, with stepparents/stepbrothers/stepsisters

Family 5. …

Conditions for a trainee to enter the “Family”:

  • through the “parent” (mentor): a necessary condition

  • through an overlapping study period with other trainees, AND/OR the trainees share the same area (?), AND/OR the trainees co-published while studying with the same mentor and up to n years after StopYear (?)
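One possible reading of the first two conditions, sketched in Python: two trainees are "siblings" if they share a mentor and their StopYear values lie within a fixed window. This is an assumption-heavy sketch. The dataset has no start year, so closeness of StopYear is used as a stand-in for an overlapping study period; the function name sibling_pairs and the window parameter are hypothetical; rows with StopYear of -1 are skipped on the assumption that -1 means "unknown".

```python
from collections import defaultdict
from itertools import combinations


def sibling_pairs(rows, window=3):
    """rows: iterable of (MenteeID, MentorID, StopYear).

    Returns a set of (MentorID, MenteeID_a, MenteeID_b) with MenteeID_a < MenteeID_b
    for trainees of the same mentor whose StopYear differs by at most `window`.
    """
    by_mentor = defaultdict(list)
    for mentee, mentor, stop_year in rows:
        if stop_year != -1:  # assumption: -1 is an unknown year, so it cannot be compared
            by_mentor[mentor].append((mentee, stop_year))

    pairs = set()
    for mentor, trainees in by_mentor.items():
        for (a, year_a), (b, year_b) in combinations(sorted(trainees), 2):
            if abs(year_a - year_b) <= window:
                pairs.add((mentor, a, b))
    return pairs


# First five rows of the data preview: (MenteeID, MentorID, StopYear)
rows = [(2, 3, 2005), (4, 3, 2006), (6, 3, 2008), (27, 3, -1), (28, 3, -1)]
print(sorted(sibling_pairs(rows, window=2)))
```

The same-area and co-publication conditions could be added as further filters inside the inner loop once their definitions are settled.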

Conditions for a mentor to enter the “Family” (?): do we need to look at the reverse logic?



First look at possible types of “families”

How many mentors does a trainee have

There are a few cases with a high number of mentors (at most 7 mentors per trainee).

Table 2: Trainees (graduate students only) and their mentors (top 1000 entries)

Figure 3: Ugly graph, sorry, but it clearly shows that having one mentor is the dominant case

About 98.2% of trainees have one mentor and 1.6% of trainees have two mentors.
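Shares like these come from counting distinct mentors per trainee and then tabulating those counts. A minimal Python sketch on toy pairs (not the real data); mentors_per_trainee_share is a hypothetical name.

```python
from collections import Counter


def mentors_per_trainee_share(edges):
    """edges: iterable of (MenteeID, MentorID).

    Returns {number_of_mentors: percent_of_trainees}, rounded to 1 decimal.
    """
    mentors = {}
    for mentee, mentor in edges:
        mentors.setdefault(mentee, set()).add(mentor)  # distinct mentors only
    dist = Counter(len(m) for m in mentors.values())
    total = sum(dist.values())
    return {k: round(100 * n / total, 1) for k, n in sorted(dist.items())}


# Toy pairs: trainees 1, 2, 3 have one mentor each, trainee 4 has two
toy_edges = [(1, 10), (2, 10), (3, 11), (4, 11), (4, 12)]
print(mentors_per_trainee_share(toy_edges))
```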

How many trainees does a mentor have

Table 3: 78.4% of mentors (graduate MentorshipType only) had one trainee in a particular StopYear

as.factor(Mentee_count_type)    Overall (N=375545)
1                               294452 (78.4%)
2                               57802 (15.4%)
3                               15142 (4.0%)
4                               4669 (1.2%)
5 and more                      3480 (0.9%)
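A table of this shape can be built by counting trainees per (MentorID, StopYear) pair and binning the counts. The Python sketch below is one guess at that logic, not the notebook's actual code. The name mentee_count_types and the cap parameter are hypothetical, each input row is assumed to be a distinct trainee, and rows with StopYear of -1 are not filtered out here.

```python
from collections import Counter


def mentee_count_types(rows, cap=5):
    """rows: iterable of (MenteeID, MentorID, StopYear).

    Counts trainees per (MentorID, StopYear) pair, then bins the counts:
    '1', '2', ... up to cap - 1, and '<cap> and more' for everything else.
    """
    per_group = Counter((mentor, year) for _, mentor, year in rows)

    def label(n):
        return str(n) if n < cap else f"{cap} and more"

    return Counter(label(n) for n in per_group.values())


# Toy rows: mentor 10 has two trainees in 2000 and one in 2001,
# mentor 11 has one in 2000, mentor 12 has five in 1999
toy_rows = [(1, 10, 2000), (2, 10, 2000), (3, 10, 2001), (4, 11, 2000)]
toy_rows += [(100 + i, 12, 1999) for i in range(5)]
print(mentee_count_types(toy_rows))
```

Note that the unit being counted is a mentor-year pair, so a mentor who is active in several years contributes several entries.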

Figure 4: One trainee in a particular StopYear is the main case

Figure 5: And again a story about the structure of the data by year

Figure 6: How mentors and trainees are matched